Stochastic calculus, non-linear filtering, and the internal model principle: implications for articulatory speech recognition
Author
Abstract
A stochastic approach to modelling speech production and perception is discussed, based on Itô calculus. Speech is modelled by a system of non-linear stochastic differential equations evolving on a finite-dimensional state space, representing a partially observed Markov process. The optimal non-linear filtering equations for the model are stated, and shown to exhibit a predictor-corrector structure, which mimics the structure of the original system. This is used to suggest a possible justification for the hypothesis that speakers and listeners make use of an “internal model” in producing and perceiving speech, and leads to a useful statistical framework for articulatory speech recognition.
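As an illustration of the predictor-corrector structure mentioned in the abstract, the following is a generic sketch of a partially observed diffusion and its Kushner-Stratonovich filtering equation. It is not taken from the paper: the symbols $f$, $g$, $h$, $W_t$, $V_t$, $\varphi$, and $\pi_t$ are assumptions chosen for illustration, with $W_t$ and $V_t$ taken to be independent standard Brownian motions and the observation noise taken to have unit covariance.

```latex
% State (hidden) and observation processes: assumed generic form
\begin{align}
  dx_t &= f(x_t)\,dt + g(x_t)\,dW_t, \\
  dy_t &= h(x_t)\,dt + dV_t .
\end{align}

% Conditional expectation of a test function \varphi given the observations up to time t
\[
  \pi_t(\varphi) = \mathbb{E}\bigl[\varphi(x_t) \mid y_s,\ 0 \le s \le t\bigr].
\]

% Kushner-Stratonovich equation: prediction term plus innovation-driven correction
\[
  d\pi_t(\varphi)
  = \underbrace{\pi_t(\mathcal{A}\varphi)\,dt}_{\text{predictor}}
  + \underbrace{\bigl[\pi_t(h\varphi) - \pi_t(h)\,\pi_t(\varphi)\bigr]^{\top}
    \bigl(dy_t - \pi_t(h)\,dt\bigr)}_{\text{corrector}},
\]

% Generator of the state diffusion
\[
  \mathcal{A}\varphi
  = f^{\top}\nabla\varphi
  + \tfrac{1}{2}\operatorname{tr}\!\bigl(g\,g^{\top}\nabla^{2}\varphi\bigr).
\]
```

In this sketch the predictor term propagates the estimate through the generator of the state dynamics, while the corrector term adjusts it in proportion to the innovation $dy_t - \pi_t(h)\,dt$, so the filter carries a copy of the system's own dynamics.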
Similar resources
Optimal filtering and smoothing for speech recognition using a stochastic target model
This paper presents a stochastic target model of speech production, where articulator motion in the vocal tract is represented by the state of a Markov-modulated linear dynamical system, driven by a piecewise-deterministic control trajectory, and observed through a non-linear function representing the articulatory-acoustic mapping. Optimal filtering and smoothing algorithms for estimating the hid...
A non-linear filtering approach to stochastic training of the articulatory-acoustic mapping using the EM algorithm
Current techniques for training representations of the articulatory-acoustic mapping from data rely on artificial simulations to provide codebooks of articulatory and acoustic measurements, which are then modelled by simple functional approximations. This paper outlines a stochastic framework for adapting an artificial model to real speech from acoustic measurements alone, using the EM algorithm....
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
The status of functional phonological information in statistical speech recognition
The choice of speech production models as a basis for Automatic Speech Recognition (ASR) is often taken to have two straightforward implications for the topology of recognition systems. Firstly, it establishes a set of articulatory properties whose elements are the basic linguistic units to be extracted from the signal. Secondly, it predefines an internal classification of these properties whic...
One-model speech recognition and synthesis based on articulatory movement HMMs
One-model speech recognition (SR) and speech synthesis (SS) based on a common articulatory movement model are described herein. The SR engine has an articulatory feature (AF) extractor and an HMM based classifier that models articulatory gestures. Experimental results of a phoneme recognition task show that the AF outperforms MFCC even if the training data are limited to a single speaker. In th...